Routing by Discriminant Projection: TREC-4

Authors

  • Kok F. Lai
  • Vincent A.S. Lee
  • Jeremy P. Chew
Abstract

We present document routing as a standard problem in discriminant analysis. The standard solution involves the inversion of a large matrix whose dimension is the number of indexed terms. Typically, the solution does not exist because the number of training documents is much smaller than the number of terms. We show that one can project this raw document space into a lower-dimensional space where a solution is possible. Our projection algorithm exploits the characteristics of the empty space, using only the training documents for efficient coding of the relevance information. Its complexity is linear with respect to the number of terms, and second order with respect to the number of training documents. We can therefore fully exploit the power of discriminant analysis without imposing severe computational and storage constraints.
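
To make the inversion problem concrete, the following is a minimal sketch that uses the two-class Fisher discriminant as a representative form of the standard solution. The notation (T terms, n training documents x_i, relevant set R, non-relevant set N, projection matrix P with k orthonormal columns) is assumed here for illustration and is not taken from the paper; how the paper actually chooses P is not shown.

```latex
\begin{align*}
  w   &= S_W^{-1}\,(m_R - m_N), \qquad
  S_W = \sum_{c \in \{R,N\}} \sum_{i \in c} (x_i - m_c)(x_i - m_c)^{\top}
        \in \mathbb{R}^{T \times T}, \\
  \operatorname{rank}(S_W) &\le n - 2 < T
        \quad\Longrightarrow\quad S_W \text{ is singular, so } S_W^{-1} \text{ does not exist.} \\[4pt]
  z_i &= P^{\top} x_i \in \mathbb{R}^{k}, \qquad
  \tilde{S}_W = P^{\top} S_W P \in \mathbb{R}^{k \times k}, \qquad
  \tilde{m}_c = P^{\top} m_c, \\
  \tilde{w} &= \tilde{S}_W^{-1}\,(\tilde{m}_R - \tilde{m}_N), \qquad
  \operatorname{score}(x) = \tilde{w}^{\top} P^{\top} x .
\end{align*}
```

Since rank(P^T S_W P) cannot exceed rank(S_W), a necessary condition for the projected matrix to be invertible in this sketch is k <= n - 2.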

Similar resources

Experiments on Routing, Filtering and Chinese Text Retrieval in TREC-5

We describe our experiments in routing, filtering and Chinese text retrieval. We based our routing and filtering experiments on our discriminant projection algorithm. The algorithm sequentially constructs a series of orthogonal axes from the training documents using the Gram-Schmidt procedure. It then rotates the resulting subspace using principal component analysis so that the axes are ordered b...
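
The axis-construction step described above can be sketched as follows. This is a minimal illustration in plain Python, assuming documents are given as dense term-weight vectors; the function names (`gram_schmidt`, `project`), the tolerance, and the toy vectors are hypothetical and not from the paper, and the subsequent principal component rotation is not covered.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(docs, tol=1e-10):
    """Build orthonormal axes one training document at a time.

    Each document is stripped of its components along the axes found so
    far (modified Gram-Schmidt); what remains, if non-negligible, is
    normalised and becomes a new axis.
    """
    axes = []
    for doc in docs:
        residual = list(doc)
        for axis in axes:
            coeff = dot(residual, axis)
            residual = [r - coeff * a for r, a in zip(residual, axis)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:  # skip documents already inside the current span
            axes.append([r / norm for r in residual])
    return axes


def project(doc, axes):
    """Coordinates of a document in the subspace spanned by the axes."""
    return [dot(doc, axis) for axis in axes]


# Toy data (assumed): 3 training documents over 5 indexed terms.
train = [
    [2.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 3.0],
    [4.0, 0.0, 2.0, 0.0, 0.0],  # a multiple of the first, adds no axis
]
axes = gram_schmidt(train)
coords = project([1.0, 1.0, 0.0, 5.0, 0.0], axes)
```

In this sketch each new document is compared against every axis found so far, and each comparison touches every term once, so the total work grows linearly with the number of terms and quadratically with the number of training documents.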

Routing as Statistical Classification

In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We demonstrate that the classifiers perform 10...

Two-Step Feature Selection and Neural Network Classification for the TREC-8 Routing

At the Caisse des Dépôts et Consignations (CDC), the Agence France-Presse (AFP) news releases are filtered continuously according to the users' interests. Once a user has specified a topic of interest, a filter is customized to fit this user's profile. Until now, these filters would rely on rule-based methods, whose efficiency is proven [Vichot et al., 1999], but which require a large amount of...

New Retrieval Approaches Using SMART: TREC 4

The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 4, performing runs in the routing, ad-hoc, confused text, interactive, and foreign language environments.

Two Steps Feature Selection and Neural Network Classification for the TREC-8 Routing

At the Caisse des Dépôts et Consignations (CDC), the Agence France-Presse (AFP) news releases are filtered continuously according to the users' interests. Once a user has specified a topic of interest, a filter is customized to fit this user's profile. Until now, these filters would rely on rule-based methods, whose efficiency is proven [Vichot et al., 1999], but which require a large amount of...

Publication date: 1996